Structure-Based Whole Genome Realignment Reveals Many Novel Non-coding RNAs
Authors
Abstract
Recent genome-wide computational screens that search for conservation of RNA secondary structure in whole-genome alignments (WGAs) have predicted thousands of structural noncoding RNAs (ncRNAs). The sensitivity of such approaches, however, is limited, due to their reliance on sequence-based whole-genome aligners, which regularly misalign structural ncRNAs. This suggests that many more structural ncRNAs may remain undetected. Structure-based alignment, which could increase the sensitivity, has been prohibitive for genome-wide screens due to its extreme computational costs. Breaking this barrier, we present the pipeline REAPR (RE-Alignment for Prediction of structural ncRNA), which efficiently realigns whole genomes based on RNA sequence and structure, thus allowing us to boost the performance of de novo ncRNA predictors, such as RNAz. Key to the pipeline's efficiency is the development of a novel banding technique for multiple RNA alignment. REAPR significantly outperforms the widely used predictors RNAz and EvoFold in genome-wide screens; in direct comparison to the most recent RNAz screen on D. melanogaster, REAPR predicts twice as many high-confidence ncRNA candidates. Moreover, modENCODE RNA-seq experiments confirm a substantial number of its predictions as transcripts. REAPR's advancement of de novo structural characterization of ncRNAs complements the identification of transcripts from rapidly accumulating RNA-seq data.
Similar resources
Publications of Sebastian Will
[2] Sebastian Will 1 and Hosna Jabbari. Sparse RNA folding revisited: space-efficient minimum free energy structure prediction. quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics. Local exact pattern matching for non-fixed RNA structures. Structure-based whole genome realignment reveals many novel non-coding RNAs. CRISPRmap: an automated classification o...
Structure-based Realignment of Non-coding RNAs in Multiple Whole Genome Alignments
Whole genome alignments have become a central tool in biological sequence analysis. A major application is the de novo prediction of non-coding RNAs (ncRNAs) from structural conservation visible in the alignment. However, current methods for constructing genome alignments do so by explicitly optimizing for sequence similarity but not structural similarity. Therefore, de novo prediction of ncRNA...
Publications of Sebastian Will Journal Articles
quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics. Local exact pattern matching for non-fixed RNA structures. Structure-based whole genome realignment reveals many novel non-coding RNAs. CRISPRmap: an automated classification of repeat conservation in prokaryotic adaptive immune systems. Incorporating thermodynamic stability in sequence and structure-ba...
The Roles of Long non-coding RNAs (lncRNA) in Prostate Cancer
Background & Objective: Prostate cancer is a compound condition in which gene expression has altered. Several surveys have revealed that genetic components have been involved in prostate cancer progression. Findings proposed that they can modify a noteworthy portion of disposing of elements, which is associated to the developing prostate cancer in protein coding sequences. The purpose of this r...
Long non-coding RNAs and their significance in human diseases
Protein-coding genes account for only a small fraction of the human genome and most of the genomic sequences are transcriptionally silent, but recent observations indicate significant functional elements, including non-coding protein transcripts in the human genome. Long non-coding RNAs (lncRNAs) have been defined as transcripts of >200 nucleotides without protein-coding capacity that perform t...
Journal title:
- Genome Research
Volume 23, Issue 6
Pages -
Publication date 2012